Business Task

Bellabeat, a technology company that produces smart health products, aims to examine the usage patterns of one of their products to gain a better understanding of how people are utilizing their smart devices. Based on these findings, the company desires strategic recommendations for how these trends can influence their marketing approach.

Stakeholders

○ Urška Sršen: Bellabeat’s cofounder and Chief Creative Officer.

○ Sando Mur: Mathematician and Bellabeat’s cofounder; key member of the Bellabeat executive team.

○ Bellabeat marketing analytics team: A team of data analysts responsible for collecting, analyzing, and reporting data that helps guide Bellabeat’s marketing strategy.

Data Sources:

● FitBit Fitness Tracker Data (CC0: Public Domain, dataset made available through Mobius): This Kaggle dataset contains personal fitness tracker data from thirty Fitbit users who consented to the submission of their data, including minute-level output for physical activity, heart rate, and sleep monitoring. It includes information about daily activity, steps, and heart rate that can be used to explore users’ habits.


Prepare the Data

The data is stored in several tables, organized by the frequency and type of observation. For this analysis we will use the data gathered daily:

1. dailyActivity_merged.csv
2. dailyCalories_merged.csv
3. dailyIntensities_merged.csv
4. dailySteps_merged.csv
5. sleepDay_merged.csv
6. weightLogInfo_merged.csv

install.packages("tidyverse")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.1     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(lubridate)

Reading the data

activity <- read_csv("Fitabase Data 4.12.16-5.12.16/dailyActivity_merged.csv")
## Rows: 940 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (1): ActivityDate
## dbl (14): Id, TotalSteps, TotalDistance, TrackerDistance, LoggedActivitiesDi...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
calories <- read_csv("Fitabase Data 4.12.16-5.12.16/dailyCalories_merged.csv")
## Rows: 940 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): ActivityDay
## dbl (2): Id, Calories
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
intensities <- read_csv("Fitabase Data 4.12.16-5.12.16/dailyIntensities_merged.csv")
## Rows: 940 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): ActivityDay
## dbl (9): Id, SedentaryMinutes, LightlyActiveMinutes, FairlyActiveMinutes, Ve...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
steps <- read_csv("Fitabase Data 4.12.16-5.12.16/dailySteps_merged.csv")
## Rows: 940 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): ActivityDay
## dbl (2): Id, StepTotal
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
sleepday <- read_csv("Fitabase Data 4.12.16-5.12.16/sleepDay_merged.csv")
## Rows: 413 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): SleepDay
## dbl (4): Id, TotalSleepRecords, TotalMinutesAsleep, TotalTimeInBed
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
weight <- read_csv("Fitabase Data 4.12.16-5.12.16/weightLogInfo_merged.csv")
## Rows: 67 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Date
## dbl (6): Id, WeightKg, WeightPounds, Fat, BMI, LogId
## lgl (1): IsManualReport
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Merging the tables using common columns

Before merging, we rename the date column in each table to a common name, date, and parse it into a date type in the tables we will merge.

activity <- rename( activity, date = ActivityDate)
calories <- rename( calories, date = ActivityDay)
intensities <- rename( intensities, date = ActivityDay)
sleepday <- rename( sleepday, date = SleepDay)
steps <- rename( steps, date = ActivityDay)
weight <- rename( weight, date = Date)
activity <- mutate( activity, date = mdy(date))
sleepday <- mutate( sleepday, date = mdy_hms(date))
weight <- mutate( weight, date = mdy_hms(date)) %>% mutate(date=as_date(date))
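The code above uses mdy() for the activity table and mdy_hms() for the sleep and weight tables because the files store their dates differently. A minimal sketch of the difference, assuming the raw strings look like the hypothetical examples below (month/day/year, with a 12-hour timestamp in the sleep and weight files):

```r
library(lubridate)

# Hypothetical raw values, assumed to mirror the format of the CSV files
activity_raw <- "4/12/2016"              # date only
sleep_raw    <- "4/12/2016 12:00:00 AM"  # date plus 12-hour timestamp

activity_date <- mdy(activity_raw)            # parsed directly to a Date
sleep_date    <- as_date(mdy_hms(sleep_raw))  # date-time reduced to a Date

# The key columns should agree in type and value before merging
print(class(activity_date))
print(activity_date == sleep_date)
```

Reducing the date-time to a Date with as_date(), as is done for the weight table, keeps the key columns of the same type across tables.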

Adding a column for the day of the week

activity <- activity %>% mutate( Weekday = weekdays(date))

We will merge the tables so that all the important information is in a single table for ease of use and analysis. Because the activity table already contains the information in the calories, steps, and intensities tables, we only merge the sleep and weight tables into it.

sleep_merged <- merge(activity,sleepday,by=c("Id","date"))
weight_merged <- merge(sleep_merged,weight, by = c("Id", "date"))

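We now have three tables: activity, sleep_merged, and weight_merged. Because merge() keeps only the rows that match in both tables by default, sleep_merged has 413 rows and weight_merged has 35. We will select the relevant columns from the weight_merged table.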
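The merged tables are much smaller than activity because base merge() performs an inner join unless told otherwise. The following sketch illustrates this with two small hypothetical tables (the Id, date, and measurement values are invented, not taken from the dataset):

```r
# Hypothetical toy tables; all values are made up for illustration
activity_toy <- data.frame(
  Id = c(1, 1, 2, 2),
  date = as.Date(c("2016-04-12", "2016-04-13", "2016-04-12", "2016-04-13")),
  TotalSteps = c(8000, 9500, 4000, 12000)
)
sleep_toy <- data.frame(
  Id = c(1, 2),
  date = as.Date(c("2016-04-12", "2016-04-13")),
  TotalMinutesAsleep = c(420, 390)
)

# Default merge(): inner join, only Id/date pairs present in both tables
inner <- merge(activity_toy, sleep_toy, by = c("Id", "date"))

# all.x = TRUE: left join, every activity row kept, missing sleep becomes NA
left <- merge(activity_toy, sleep_toy, by = c("Id", "date"), all.x = TRUE)

print(nrow(inner))                          # 2
print(nrow(left))                           # 4
print(sum(is.na(left$TotalMinutesAsleep)))  # 2
```

An inner join suits the analysis here, since each plot needs both measurements, but it means conclusions from the merged tables apply only to the days on which users logged sleep or weight.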

summary(activity)
##        Id                 date              TotalSteps    TotalDistance   
##  Min.   :1.504e+09   Min.   :2016-04-12   Min.   :    0   Min.   : 0.000  
##  1st Qu.:2.320e+09   1st Qu.:2016-04-19   1st Qu.: 3790   1st Qu.: 2.620  
##  Median :4.445e+09   Median :2016-04-26   Median : 7406   Median : 5.245  
##  Mean   :4.855e+09   Mean   :2016-04-26   Mean   : 7638   Mean   : 5.490  
##  3rd Qu.:6.962e+09   3rd Qu.:2016-05-04   3rd Qu.:10727   3rd Qu.: 7.713  
##  Max.   :8.878e+09   Max.   :2016-05-12   Max.   :36019   Max.   :28.030  
##  TrackerDistance  LoggedActivitiesDistance VeryActiveDistance
##  Min.   : 0.000   Min.   :0.0000           Min.   : 0.000    
##  1st Qu.: 2.620   1st Qu.:0.0000           1st Qu.: 0.000    
##  Median : 5.245   Median :0.0000           Median : 0.210    
##  Mean   : 5.475   Mean   :0.1082           Mean   : 1.503    
##  3rd Qu.: 7.710   3rd Qu.:0.0000           3rd Qu.: 2.053    
##  Max.   :28.030   Max.   :4.9421           Max.   :21.920    
##  ModeratelyActiveDistance LightActiveDistance SedentaryActiveDistance
##  Min.   :0.0000           Min.   : 0.000      Min.   :0.000000       
##  1st Qu.:0.0000           1st Qu.: 1.945      1st Qu.:0.000000       
##  Median :0.2400           Median : 3.365      Median :0.000000       
##  Mean   :0.5675           Mean   : 3.341      Mean   :0.001606       
##  3rd Qu.:0.8000           3rd Qu.: 4.782      3rd Qu.:0.000000       
##  Max.   :6.4800           Max.   :10.710      Max.   :0.110000       
##  VeryActiveMinutes FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes
##  Min.   :  0.00    Min.   :  0.00      Min.   :  0.0        Min.   :   0.0  
##  1st Qu.:  0.00    1st Qu.:  0.00      1st Qu.:127.0        1st Qu.: 729.8  
##  Median :  4.00    Median :  6.00      Median :199.0        Median :1057.5  
##  Mean   : 21.16    Mean   : 13.56      Mean   :192.8        Mean   : 991.2  
##  3rd Qu.: 32.00    3rd Qu.: 19.00      3rd Qu.:264.0        3rd Qu.:1229.5  
##  Max.   :210.00    Max.   :143.00      Max.   :518.0        Max.   :1440.0  
##     Calories      Weekday         
##  Min.   :   0   Length:940        
##  1st Qu.:1828   Class :character  
##  Median :2134   Mode  :character  
##  Mean   :2304                     
##  3rd Qu.:2793                     
##  Max.   :4900
summary(sleep_merged)
##        Id                 date              TotalSteps    TotalDistance   
##  Min.   :1.504e+09   Min.   :2016-04-12   Min.   :   17   Min.   : 0.010  
##  1st Qu.:3.977e+09   1st Qu.:2016-04-19   1st Qu.: 5206   1st Qu.: 3.600  
##  Median :4.703e+09   Median :2016-04-27   Median : 8925   Median : 6.290  
##  Mean   :5.001e+09   Mean   :2016-04-26   Mean   : 8541   Mean   : 6.039  
##  3rd Qu.:6.962e+09   3rd Qu.:2016-05-04   3rd Qu.:11393   3rd Qu.: 8.030  
##  Max.   :8.792e+09   Max.   :2016-05-12   Max.   :22770   Max.   :17.540  
##  TrackerDistance  LoggedActivitiesDistance VeryActiveDistance
##  Min.   : 0.010   Min.   :0.0000           Min.   : 0.00     
##  1st Qu.: 3.600   1st Qu.:0.0000           1st Qu.: 0.00     
##  Median : 6.290   Median :0.0000           Median : 0.57     
##  Mean   : 6.034   Mean   :0.1131           Mean   : 1.45     
##  3rd Qu.: 8.020   3rd Qu.:0.0000           3rd Qu.: 2.37     
##  Max.   :17.540   Max.   :4.0817           Max.   :12.54     
##  ModeratelyActiveDistance LightActiveDistance SedentaryActiveDistance
##  Min.   :0.0000           Min.   :0.010       Min.   :0.0000000      
##  1st Qu.:0.0000           1st Qu.:2.540       1st Qu.:0.0000000      
##  Median :0.4200           Median :3.680       Median :0.0000000      
##  Mean   :0.7502           Mean   :3.807       Mean   :0.0009201      
##  3rd Qu.:1.0400           3rd Qu.:4.930       3rd Qu.:0.0000000      
##  Max.   :6.4800           Max.   :9.480       Max.   :0.1100000      
##  VeryActiveMinutes FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes
##  Min.   :  0.00    Min.   :  0.00      Min.   :  2.0        Min.   :   0.0  
##  1st Qu.:  0.00    1st Qu.:  0.00      1st Qu.:158.0        1st Qu.: 631.0  
##  Median :  9.00    Median : 11.00      Median :208.0        Median : 717.0  
##  Mean   : 25.19    Mean   : 18.04      Mean   :216.9        Mean   : 712.2  
##  3rd Qu.: 38.00    3rd Qu.: 27.00      3rd Qu.:263.0        3rd Qu.: 783.0  
##  Max.   :210.00    Max.   :143.00      Max.   :518.0        Max.   :1265.0  
##     Calories      Weekday          TotalSleepRecords TotalMinutesAsleep
##  Min.   : 257   Length:413         Min.   :1.000     Min.   : 58.0     
##  1st Qu.:1850   Class :character   1st Qu.:1.000     1st Qu.:361.0     
##  Median :2220   Mode  :character   Median :1.000     Median :433.0     
##  Mean   :2398                      Mean   :1.119     Mean   :419.5     
##  3rd Qu.:2926                      3rd Qu.:1.000     3rd Qu.:490.0     
##  Max.   :4900                      Max.   :3.000     Max.   :796.0     
##  TotalTimeInBed 
##  Min.   : 61.0  
##  1st Qu.:403.0  
##  Median :463.0  
##  Mean   :458.6  
##  3rd Qu.:526.0  
##  Max.   :961.0
weight_merged <- weight_merged %>% select(Id,date,TotalSteps,TotalDistance,TrackerDistance , LoggedActivitiesDistance, VeryActiveDistance,ModeratelyActiveDistance, LightActiveDistance, SedentaryActiveDistance, VeryActiveMinutes, FairlyActiveMinutes, LightlyActiveMinutes, SedentaryMinutes, Calories, TotalSleepRecords, TotalMinutesAsleep, TotalTimeInBed, WeightKg, BMI,Weekday)
summary(weight_merged)
##        Id                 date              TotalSteps    TotalDistance   
##  Min.   :1.504e+09   Min.   :2016-04-12   Min.   :  356   Min.   : 0.250  
##  1st Qu.:6.962e+09   1st Qu.:2016-04-18   1st Qu.: 5780   1st Qu.: 3.825  
##  Median :6.962e+09   Median :2016-04-28   Median :10524   Median : 6.960  
##  Mean   :6.398e+09   Mean   :2016-04-26   Mean   : 9687   Mean   : 6.523  
##  3rd Qu.:6.962e+09   3rd Qu.:2016-05-03   3rd Qu.:12484   3rd Qu.: 8.730  
##  Max.   :6.962e+09   Max.   :2016-05-12   Max.   :20031   Max.   :13.240  
##  TrackerDistance  LoggedActivitiesDistance VeryActiveDistance
##  Min.   : 0.250   Min.   :0.0000           Min.   :0.000     
##  1st Qu.: 3.825   1st Qu.:0.0000           1st Qu.:0.000     
##  Median : 6.960   Median :0.0000           Median :1.200     
##  Mean   : 6.464   Mean   :0.2867           Mean   :1.727     
##  3rd Qu.: 8.610   3rd Qu.:0.0000           3rd Qu.:3.305     
##  Max.   :13.240   Max.   :4.0817           Max.   :5.980     
##  ModeratelyActiveDistance LightActiveDistance SedentaryActiveDistance
##  Min.   :0.0000           Min.   :0.25        Min.   :0.000          
##  1st Qu.:0.1900           1st Qu.:2.76        1st Qu.:0.000          
##  Median :0.7600           Median :3.91        Median :0.000          
##  Mean   :0.9083           Mean   :3.88        Mean   :0.006          
##  3rd Qu.:1.6800           3rd Qu.:4.88        3rd Qu.:0.000          
##  Max.   :2.3900           Max.   :7.04        Max.   :0.110          
##  VeryActiveMinutes FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes
##  Min.   :  0.00    Min.   : 0.00       Min.   : 32.0        Min.   : 127.0  
##  1st Qu.:  0.00    1st Qu.: 3.50       1st Qu.:197.0        1st Qu.: 635.5  
##  Median : 18.00    Median :15.00       Median :240.0        Median : 689.0  
##  Mean   : 27.49    Mean   :18.37       Mean   :236.5        Mean   : 688.5  
##  3rd Qu.: 42.00    3rd Qu.:33.50       3rd Qu.:286.0        3rd Qu.: 736.0  
##  Max.   :200.00    Max.   :42.00       Max.   :369.0        Max.   :1121.0  
##     Calories    TotalSleepRecords TotalMinutesAsleep TotalTimeInBed 
##  Min.   : 928   Min.   :1.000     Min.   :115.0      Min.   :129.0  
##  1st Qu.:1852   1st Qu.:1.000     1st Qu.:399.0      1st Qu.:420.0  
##  Median :2039   Median :1.000     Median :442.0      Median :455.0  
##  Mean   :2052   Mean   :1.086     Mean   :430.3      Mean   :449.8  
##  3rd Qu.:2168   3rd Qu.:1.000     3rd Qu.:472.5      3rd Qu.:494.0  
##  Max.   :4552   Max.   :3.000     Max.   :630.0      Max.   :679.0  
##     WeightKg           BMI          Weekday         
##  Min.   : 52.60   Min.   :22.65   Length:35         
##  1st Qu.: 61.20   1st Qu.:23.89   Class :character  
##  Median : 61.50   Median :24.00   Mode  :character  
##  Mean   : 64.17   Mean   :24.83                     
##  3rd Qu.: 61.90   3rd Qu.:24.17                     
##  Max.   :133.50   Max.   :47.54

Analysis

First, we will analyze the activity table.

ggplot(activity, aes(x = TotalSteps , y = Calories)) + geom_point()+geom_smooth()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

ggplot(activity, aes(x = TotalSteps , y = TotalDistance)) + geom_point()+geom_smooth()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

From the two graphs above, we can observe a strong positive relationship between TotalSteps and TotalDistance, which is expected since distance is accumulated largely by walking. The relationship between TotalSteps and Calories is not as clear, because calories are also burned at rest and through activities that involve few steps, such as weightlifting or yoga. Step counts alone may therefore not be the most accurate measure of total physical activity, and additional factors such as exercise intensity or heart rate may be needed to get a more complete picture of energy expenditure.
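The visual comparison can be backed by a number. A minimal sketch using Pearson correlation on a small hypothetical table (the values below are invented for illustration and are not results from the Fitbit data); the same two cor() calls could be run on the activity table:

```r
# Hypothetical daily records (values invented for illustration only)
toy <- data.frame(
  TotalSteps    = c(2000, 5000, 8000, 11000, 14000),
  TotalDistance = c(1.4, 3.5, 5.7, 7.8, 10.0),
  Calories      = c(1900, 1800, 2600, 2300, 2900)
)

# Pearson correlation of steps with distance and with calories
r_distance <- cor(toy$TotalSteps, toy$TotalDistance)
r_calories <- cor(toy$TotalSteps, toy$Calories)

print(round(c(distance = r_distance, calories = r_calories), 3))
```

A coefficient close to 1 indicates a nearly linear relationship, while a lower value indicates more scatter around the trend.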


total <- (sum(activity$SedentaryMinutes)+sum(activity$LightlyActiveMinutes)+sum(activity$FairlyActiveMinutes) + sum(activity$VeryActiveMinutes))/100
sedentary_percentage <- sum(activity$SedentaryMinutes)/total
lightlyActive_percentage <- sum(activity$LightlyActiveMinutes)/total
fairlyActive_percentage <- sum(activity$FairlyActiveMinutes)/total
veryActive_percentage <- sum(activity$VeryActiveMinutes)/total
percentage <- data.frame(
  level=c("Sedentary", "Lightly Active", "Fairly Active", "Very Active"),
  percentage=c( sedentary_percentage,lightlyActive_percentage,fairlyActive_percentage,veryActive_percentage)
)

pie(percentage$percentage,labels= paste(percentage$level,"-",round(percentage$percentage,1),"%"),col = rainbow(length(percentage$percentage)),main="Percentage of Various Activity Levels")

From the pie chart, we can see that users as a whole spent 81.3% of their tracked time in a sedentary state and only 1.74% in a very active state.
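The percentages can also be computed in one step with colSums(). A short sketch on hypothetical minutes for three user-days (invented values; the column names match the activity table):

```r
# Hypothetical minutes per activity level for three user-days
toy <- data.frame(
  SedentaryMinutes     = c(1000, 900, 1100),
  LightlyActiveMinutes = c(200, 250, 150),
  FairlyActiveMinutes  = c(20, 10, 0),
  VeryActiveMinutes    = c(20, 40, 10)
)

# Sum each column, then express each sum as a percentage of the grand total
minutes_by_level <- colSums(toy)
percentage <- 100 * minutes_by_level / sum(minutes_by_level)

print(round(percentage, 1))
```

Because every share is divided by the same grand total, the four percentages always add up to 100.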


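# Order the days Monday to Sunday (named weekday_levels to avoid masking base::weekdays())
weekday_levels <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")

activity$Weekday <- factor(activity$Weekday, levels = weekday_levels)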

ggplot(data = activity, aes(x = Weekday, y = TotalSteps, fill = Weekday)) + 
  geom_bar(stat = "identity") +
  ylab("Total Steps")

Weekdays have higher total steps than weekends: The chart shows that summed steps are generally higher on weekdays (Monday to Friday) than on weekends (Saturday and Sunday). This could suggest that users tend to be more active during the week, possibly due to work-related activities or other weekday routines. Note, however, that the bars sum every record, and the period from 2016-04-12 to 2016-05-12 contains five Tuesdays, Wednesdays, and Thursdays but only four of each other day, so part of the difference reflects the number of days recorded rather than activity.

Total steps are lowest on weekends: Summed steps are lowest on weekends, particularly on Sundays. This could suggest that users tend to be less active on weekends, possibly due to leisure activities or rest days.
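Since the bar chart sums steps over all records, it is worth checking how many times each weekday occurs in the study period. The sketch below counts them and then uses a hypothetical user who walks exactly 1000 steps every day (an assumption for illustration, as is the simplification that every day was recorded) to show how sums and means behave:

```r
# Every calendar day in the study period reported by summary(activity)
days <- seq(as.Date("2016-04-12"), as.Date("2016-05-12"), by = "day")

# ISO weekday number: 1 = Monday ... 7 = Sunday (locale-independent)
iso_weekday <- as.integer(format(days, "%u"))
days_per_weekday <- table(factor(iso_weekday, levels = 1:7))
print(days_per_weekday)

# Hypothetical user who walks exactly 1000 steps every day
toy <- data.frame(weekday = iso_weekday, TotalSteps = 1000)

# Summed steps differ by weekday only because the day counts differ
sum_steps <- tapply(toy$TotalSteps, toy$weekday, sum)
# Mean steps are identical for every weekday
mean_steps <- tapply(toy$TotalSteps, toy$weekday, mean)

print(sum_steps)
print(mean_steps)
```

Plotting mean steps per weekday instead of the sum would remove this effect from the comparison.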


Now we will analyze the sleep data.

ggplot(data= sleep_merged, aes(x = TotalMinutesAsleep, y = Calories)) +
  geom_col(size = 3) + geom_smooth(col="red")+
  theme_classic()
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

This chart shows the relationship between activity level (calories burned) and sleep duration. With some activity, even very little, we see a large increase in normal sleepers (7 to 8 hours).

The important insight here is the decrease in oversleepers (more than 8 hours) in the highest calorie-burn category.
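The sleeper groups mentioned above can be derived from TotalMinutesAsleep. A minimal sketch on hypothetical records, assuming the 7 and 8 hour cut-offs from the text (420 and 480 minutes); the category labels are illustrative names, not columns of the dataset:

```r
# Hypothetical sleep records (values invented for illustration only)
toy <- data.frame(
  TotalMinutesAsleep = c(350, 420, 455, 480, 510),
  Calories           = c(1800, 2100, 2500, 2300, 1900)
)

# Under 7 h, 7 to 8 h (inclusive), and over 8 h of sleep
toy$SleepCategory <- ifelse(
  toy$TotalMinutesAsleep < 420, "Under 7 h",
  ifelse(toy$TotalMinutesAsleep <= 480, "7 to 8 h", "Over 8 h")
)

# Number of records and mean calories in each sleep category
print(table(toy$SleepCategory))
print(tapply(toy$Calories, toy$SleepCategory, mean))
```

Applied to sleep_merged, a grouping like this would let the share of each sleeper type be compared across levels of calories burned.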


ggplot(weight_merged, aes(x = TotalSteps, y = WeightKg)) +
  geom_point(alpha = 0.5) + geom_smooth(col="pink") +
  labs(x = "Activity Level (Total Steps)", y = "Weight (kg)") +
  ggtitle("Relationship between activity level and weight for Fitbit users")
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

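The plot suggests that more active users tend to weigh less, but this is an association across records, not evidence of weight loss. The merged table has only 35 rows, and the Id summary above (first quartile, median, and third quartile all equal to 6.962e+09) indicates that most of them come from a single user, so this result should be treated with caution.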
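Before reading a trend into a small table, it helps to check how the rows are spread across users. A short sketch on a hypothetical weight log (the Id and WeightKg values are invented); the same table() call could be applied to weight_merged$Id:

```r
# Hypothetical weight log in which user 3 contributes most of the rows
toy <- data.frame(
  Id       = c(1, 2, 3, 3, 3, 3, 3, 3),
  WeightKg = c(52.6, 85.0, 61.2, 61.5, 61.4, 61.9, 61.7, 61.5)
)

# Number of rows logged by each user
rows_per_user <- table(toy$Id)
print(rows_per_user)

# Share of all rows contributed by the most frequently logged user
largest_share <- max(rows_per_user) / nrow(toy)
print(largest_share)  # 0.75
```

When one user dominates, a fitted curve mostly describes that person's day-to-day variation rather than a pattern across users.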


Recommendations

1. Women tend to be active during the day but also have a high proportion of sedentary time. This suggests that women are fitting physical activity around their busy schedules but may be spending too much time sitting.

This insight provides an opportunity for Bellabeat to develop new features that encourage women to take more breaks and engage in light activity throughout the day, such as reminders to stand up and stretch or to take a short walk. By addressing this specific need of their target market, Bellabeat can differentiate themselves from competitors and provide value to their customers.

Furthermore, this insight highlights the importance of understanding the unique behaviors and needs of a specific target market. By tailoring product features and marketing strategies to the specific needs of their customers, companies can create more value and build stronger relationships with their customers.

2. Women who had longer and more consistent sleep patterns tended to be more active during the day. This suggests a connection between sleep quality and physical activity, and that addressing one area may lead to improvements in the other.

This insight provides an opportunity for Bellabeat to develop new features that help customers improve their sleep quality and consistency, such as sleep tracking and analysis, personalized sleep recommendations, and guided meditations or relaxation exercises. By focusing on both sleep and physical activity, Bellabeat can provide a more holistic approach to wellness and differentiate themselves from competitors.

Additionally, this insight highlights the importance of taking a comprehensive approach to understanding customer behavior and needs. By analyzing multiple aspects of customers’ lives, such as activity levels, sleep patterns, stress levels, and menstrual cycles, Bellabeat can gain a more complete understanding of its customers and identify opportunities for improving its product offerings.

Image by redgreystock on Freepik